Study Reveals AI Models Vulnerable to Poisoning with Minimal Malicious Data
New research demonstrates that AI models can be compromised with as few as 250 poisoned documents, regardless of model size. The study, conducted by a consortium including Anthropic and the UK AI Security Institute, overturns the assumption that data poisoning requires control over a significant percentage of a training dataset. Attack success hinges solely on the number of malicious samples injected during training.
Models ranging from 600 million to 13 billion parameters proved equally susceptible. Even when trained on billions of clean data points, these backdoors persisted—though clean retraining could partially mitigate the damage. The findings highlight critical vulnerabilities in systems relying on public web scraping for training data.